Algorithms for Learning Regular Expressions
Abstract
We describe algorithms that directly infer regular expressions from positive data and characterize the regular language classes that can be learned this way.
Similar resources
Theory and Algorithms for Information Extraction and Classification in Textual Data Mining
Regular expressions can be used as patterns to extract features from semi-structured and narrative text [8]. For example, in police reports a suspect’s height might be recorded as “{CD} feet {CD} inches tall”, where {CD} is the part of speech tag for a numeric value. The result in [1] shows us that regular expressions could have higher performance than explicit expressions in some applications ...
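The height pattern quoted in this excerpt can be sketched in Python. This is a minimal illustration, not the cited work's method: it assumes the `{CD}` slot can be approximated by a run of digits in raw text rather than by a real part-of-speech tagger, and the names `TEMPLATE`, `template_to_regex`, and `extract_height` are hypothetical.

```python
import re

# Template copied from the example in the text; {CD} marks a numeric slot.
TEMPLATE = "{CD} feet {CD} inches tall"


def template_to_regex(template: str) -> re.Pattern:
    """Compile a template into a regex, treating each {CD} as a digit group.

    Assumption: a numeric value is a plain run of digits. A real system
    would rely on part-of-speech tags instead.
    """
    # Escape the literal text between slots, then join with a capturing group.
    literal_parts = template.split("{CD}")
    pattern = r"(\d+)".join(re.escape(part) for part in literal_parts)
    return re.compile(pattern)


def extract_height(text: str):
    """Return (feet, inches) as integers, or None if the pattern is absent."""
    match = template_to_regex(TEMPLATE).search(text)
    if match is None:
        return None
    feet, inches = match.groups()
    return int(feet), int(inches)


print(extract_height("The suspect is 5 feet 11 inches tall and thin."))
```

Escaping the literal parts of the template keeps any regex metacharacters in the surrounding text from being interpreted as pattern syntax.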
Learning Regular Languages via Alternating Automata
Nearly all algorithms for learning an unknown regular language, in particular the popular L∗ algorithm, yield deterministic finite automata. It was recently shown that the ideas of L∗ can be extended to yield non-deterministic automata, and that the respective learning algorithm, NL∗, outperforms L∗ on randomly generated regular expressions. We conjectured that this is due to the existential na...
Learning Regular Expressions from Noisy Sequences
The presence of long gaps dramatically increases the difficulty of detecting and characterizing complex events hidden in long sequences. In order to cope with this problem, a learning algorithm based on an abstraction mechanism is proposed: it can infer the general model of complex events from a set of learning sequences. Events are described by means of regular expressions, and the abstraction...
Algorithms for learning regular expressions from positive data
Article history: Received 26 October 2007; revised 5 December 2008; available online 24 January 2009.
Wiki Vandalysis - Wikipedia Vandalism Analysis - Lab Report for PAN at CLEF 2010
Wikipedia describes itself as the “free encyclopedia that anyone can edit”. Along with the helpful volunteers who contribute by improving the articles, a great number of malicious users abuse the open nature of Wikipedia by vandalizing articles. Deterring and reverting vandalism has become one of the major challenges of Wikipedia as its size grows. Wikipedia editors fight vandalism both manuall...